Three Situational Awareness Rating Scales (SARS) were developed
to measure pilot performance in an operational fighter
environment. These instruments rated situational awareness
(SA) from three perspectives: supervisors, peers, and self-report.
SARS data were gathered from 205 mission-ready USAF F-15C
pilots from 8 operational squadrons. Reliabilities of the SARS
were quite high, as measured by their internal consistency (0.95
to 0.99) and inter-rater agreement (0.88 to 0.97). Correlations
between the supervisory and peer SARS were strongly positive
(0.89 to 0.92), while correlations with the self-report SARS were
positive, but smaller (0.45 to 0.57). A composite SA score was
developed from the supervisory and peer SARS using a principal
components analysis. The resulting score was found to be highly
related to previous flight experience and current flight
qualification. A prediction equation derived from available
background and experience factors accounted for 73% of its
variance. Implications for use of the composite SA score as a
criterion measure are discussed.


SITUATIONAL  AWARENESS  (SA)  has  generated 
considerable  interest  within  the  aviation  commu¬ 
nity.  Loss  of  SA  has  been  considered  a  major  contrib¬ 
utory  factor  in  many  military  and  commercial  aviation 
accidents  and  incidents  (18).  The  human  factors  com¬ 
munity  is  pursuing  the  measurement  of  SA  as  a  tool  for 
evaluating  cockpit  design  (14).  There  is  also  interest  in 
SA  within  the  fighter  pilot  community,  since  it  is  con¬ 
sidered  a  key  element  in  determining  success  during  air 
combat  operations  (21). 

In  1991,  the  U.S.  Air  Force  Chief  of  Staff  posed  a 
series  of  questions  concerning  SA  that  led  to  the  present 
investigation.  First  of  all,  what  is  SA?  Can  it  be  objec¬ 
tively  measured?  Is  SA  learned  or  does  it  represent  a 
basic  ability  or  characteristic  that  some  pilots  have  and 
others do not? From a research standpoint, these questions
translate into issues of measurement, selection,
and  training.  Armstrong  Laboratory  was  subsequently 
tasked  with  providing  research  answers  to  these  ques¬ 
tions.  A  research  investigation  was  initiated  that  had 
three  goals:  first,  to  develop  and  validate  tools  for  reli¬ 
ably  measuring  SA;  second,  to  identify  basic  cognitive 
and  psychomotor  abilities  that  are  associated  with  pilots 
judged  to  have  good  SA;  and  third,  to  determine  if  SA 
can  be  learned,  and  if  so,  to  identify  areas  where  cost- 
effective  training  tools  might  be  developed  and  em¬ 
ployed. 

The  general  approach  was  first  to  develop  criterion 
measures  of  SA  based  upon  performance  ratings  col¬ 
lected  within  an  operational  flying  environment.  These 
measures  were  necessary  for  two  reasons.  First,  they 
would  serve  as  criterion  measures  against  which  to  val¬ 
idate  a  battery  of  basic  ability  tests  considered  relevant 
to  SA,  thereby  addressing  the  question  of  basic  human 
abilities.  Second,  these  measures  would  serve  as  a 
means  of  selecting  a  sample  of  pilots  who  would  partic¬ 
ipate  in  a  simulation  phase  of  the  effort.  During  that 
phase,  simulated  air  combat  mission  scenarios  were  de¬ 
veloped  for  assessing  SA,  and  objective  measures  of 
performance  gathered  in  an  attempt  to  determine  those 
characteristics  that  distinguish  pilots  with  good  SA. 
These  data  would  be  used  to  identify  areas  where  train¬ 
ing  tools  might  be  developed.  This  article  presents  the 
results  of  only  the  first  phase  of  the  program;  namely, 
the  development  of  tools  for  measuring  SA  within  an 
operational  fighter  environment. 

The  approach  to  developing  measurement  tools  was 
largely  dictated  by  the  definition  of  SA  adopted  at  the 
outset  of  the  study,  the  intended  use  of  the  data,  and 
practical  constraints  involved  in  gathering  data  on  mis¬ 
sion-ready  aircrew.  In  response  to  the  question,  “What 
is  SA?”,  the  Air  Staff  produced  the  following  operator’s 
definition:  “a  pilot’s  continuous  perception  of  self  and 
aircraft  in  relation  to  the  dynamic  environment  of  flight, 
threats,  and  mission,  and  the  ability  to  forecast,  then 
execute  tasks  based  on  that  perception”  (2).  While 
other definitions of SA within the literature focus primarily
on processes underlying the assessment of the
situation  (13),  our  working  definition  also  included  fore¬ 
casting,  decision-making,  and  task  execution.  As  such, 
it  was  viewed  as  a  fairly  global  operational  concept  that 
encompasses  much  of  the  domain  of  air  combat  profi¬ 
ciency. 

Since  the  data  were  to  be  used  primarily  as  a  criterion 
against  which  to  determine  relationships  with  basic  abil¬ 
ity  measures,  fairly  large  numbers  of  subjects  would  be 
required.  This  requirement  further  restricted  the  types 
of  measures  to  those  that  could  be  gathered  in  a  quick, 
noninvasive  manner,  since  available  pilot  time  within 
any  operational  flying  environment  is  limited.  While  a 
number  of  measurement  tools  had  been  developed  to 
measure  SA  within  a  highly  controlled,  simulated  flight 
environment  (4,5),  these  did  not  seem  appropriate  for 
the  present  study.  Rather,  previous  efforts  to  develop 
criterion  measures  of  combat  effectiveness  seemed 
more  germane  (22). 

Attempts  to  measure  and  predict  combat  effective¬ 
ness  have  a  long  history  dating  back  to  World  War  II. 
The  interested  reader  is  referred  to  Youngling  et  al.  (22), 
who  conducted  an  extensive  review  of  this  literature.  In 
essence,  there  are  two  problems  that  must  be  ad¬ 
dressed:  first,  the  definition  of  the  criterion  for  combat 
effectiveness;  and  second,  the  search  for  measures  that 
are  predictive  of  that  criterion.  In  general,  four  types  of 
criteria  have  been  used:  objective  outcome  measures 
such  as  kills,  bombing  scores,  etc.;  direct  and  system¬ 
atic  observations  of  mission  performance;  administra¬ 
tive  actions  such  as  failure  to  complete  a  fighter  tour; 
and  qualitative  ratings  of  overall  ability.  On  the  predic¬ 
tor  side,  a  variety  of  potential  indicators  of  combat  ef¬ 
fectiveness  have  been  explored,  including  basic  apti¬ 
tude,  biographical  factors  including  flight  experience,  a 
variety  of  personality  and  motivational  factors,  percep¬ 
tual-motor  abilities,  and  knowledge  and  skills  directly 
related  to  aviation. 

In  general,  only  very  modest  relationships  have  been 
obtained.  Of  the  predictor  sets  that  have  been  evalu¬ 
ated,  those  measures  related  to  previous  flight  experi¬ 
ence  seem  to  be  most  consistently  related  to  combat 
effectiveness  as  measured  by  combat  kills.  Strawbridge 
and  Kahn  (17)  and  Torrance  et  al.  (20)  reported  corre¬ 
lations  with  previous  flight  experience  in  the  range  of 
0.30  to  0.40.  Correlations  with  aptitude  test  scores  and 
other  perceptual-motor  tests  were  substantially  lower, 
with  most  failing  to  reach  statistical  significance.  De¬ 
Leon  (3)  summarized  the  results  of  the  Red  Baron  stud¬ 
ies  that  were  conducted  during  the  Vietnam  conflict. 
Flight  experience,  in  terms  of  total  flight  hours,  total 
fighter  hours,  and  hours  in  the  combat  aircraft,  was 
found  to  be  related  to  combat  success,  although  the 
degree  of  these  relationships  was  fairly  small.  DeLeon 
(3,  p.  16)  concluded  that  “at  best,  experience  appears  to 
be  only  a  vague  measure  of  a  pilot’s  air-to-air  combat 
skills.” 

In  summary,  previous  studies  have  reported  the  high¬ 
est  relationships  between  flight  experience  and  criteria 
involving  actual  combat  success;  i.e.,  kills.  However, 
such  criteria  were  not  available  for  the  present  study; 
nor  was  it  feasible  for  reasons  of  time,  cost,  and  lack  of 
combat realism to gather data based on actual mission
performance in the aircraft under the highly controlled
conditions  of  an  instrumented  range  environment,  as 
suggested  by  Youngling  et  al.  (22).  For  practical  rea¬ 
sons,  the  only  alternative  was  to  develop  criterion  mea¬ 
sures  based  upon  human  judgments.  Unfortunately,  the 
use  of  subjective  ratings  of  overall  ability  as  the  crite¬ 
rion  of  combat  effectiveness  has  produced  few  statisti¬ 
cally  significant  relationships  with  predictor  sets  that 
have  been  used  to  date.  For  example,  Lepley  (12)  found 
only  one  significant  correlation  for  his  test  battery  with 
subjective  ratings  of  ability.  Shannon  and  Waag  (15,16) 
met  with  limited  success  in  an  attempt  to  relate  back¬ 
ground  and  experience  factors  to  operational  perfor¬ 
mance.  In  this  case,  squadron  commander  ratings  of 
mission-critical  performance  dimensions  were  the  crite¬ 
rion  measures.  Results  indicated  that  flight  experience 
was  the  best  predictor  of  criterion  performance.  Under¬ 
graduate  Pilot  Training  (UPT)  grades  for  formation  and 
tactics  were  also  found  related  to  such  ratings.  How¬ 
ever,  the  overall  magnitude  of  the  relationship  was 
fairly  low  with  a  multiple  correlation  of  all  background 
and  experience  factors  reaching  only  0.35. 

In  general,  three  types  of  performance  ratings  have 
been  used.  The  most  common,  and  also  most  cited  in 
the  literature,  has  been  supervisory  ratings  (11).  The 
two  other  types  include  peer  ratings  and  self  ratings.  In 
fact,  the  use  of  peer  ratings  for  combat  aviation  dates 
back  to  World  War  II,  when  Jenkins  et  al.  (7)  developed 
a  “combat”  criterion  for  the  U.S.  Navy,  based  largely 
upon  peer  nominations.  In  an  extensive  literature  re¬ 
view,  Landy  and  Farr  (11)  conclude  that  previous  re¬ 
search  studies  have  not  found  very  high  correlations 
among  these  three  types  of  ratings.  Moreover,  it  is  dif¬ 
ficult  to  select  one  approach  as  best  since  the  literature 
does  not  suggest  any  of  these  to  be  more  valid  than  the 
others.  For  these  reasons,  it  was  decided  to  develop 
three  SA  Rating  Scales  (SARS)  and  gather  supervisory, 
peer,  and  self-report  data.  Moreover,  it  was  decided  to 
use a simple graphic scale since the literature is equivocal
regarding more elaborate procedures such as behaviorally
anchored rating scales (11).

What  seemed  most  critical,  however,  were  the  actual 
dimensions  that  were  to  be  rated  and  the  development 
of  clear  definitions  for  each.  To  characterize  the  domain 
of  air  combat,  it  was  necessary  first  to  identify  and  de¬ 
scribe  the  critical  activities  required  of  the  fighter  pilot 
to  maintain  good  SA  and  complete  his  mission  success¬ 
fully.  To  this  end,  Houck  et  al.  (10)  conducted  a  cogni¬ 
tive  task  analysis  of  the  attack  portion  of  an  F-15C  air 
combat  mission.  This  analysis  relied  primarily  on  the 
input  of  experienced  fighter  pilots  and  focused  on  crit¬ 
ical  air  combat  task  categories  that  in  previous  research 
were  rated  by  F-15C  pilots  as  being  most  amenable  to 
training  in  air  combat  simulations  (8,9,19).  The  resulting 
analysis  identified  the  significant  types  of  decisions  re¬ 
quired  of  the  flight  members,  the  information  required 
for  making  these  decisions,  and  the  observable  activi¬ 
ties  the  flight  members  performed  to  acquire  this  infor¬ 
mation.  For  the  purposes  of  the  present  research,  this 
classification  provided  a  detailed  description  of  opti¬ 
mum  performance  in  air  combat. 

The  resulting  classification  was  further  analyzed  by 
an experienced fighter pilot to derive those aspects of
air combat operations judged most essential to SA. Paramount
in this selection process was that the items must
be  observable  in  the  context  of  day-to-day  squadron 
training  activities  and  subject  to  evaluation  by  other 
fighter  pilots  both  in  terms  of  their  own  performance 
and  that  of  others.  A  further  requirement  was  that  the 
pilots  must  be  able  to  assess  these  items  in  retrospect, 
based  on  performance  observed  to  date.  As  a  result  of 
this  analysis,  24  items  organized  in  7  categories  were 
produced.  Categories  included  tactical  game  plan,  sys¬ 
tem  operation,  communication,  information  interpreta¬ 
tion,  beyond-visual-range  weapons  employment,  visual 
maneuvering,  and  general  tactical  employment.  Be¬ 
cause  the  24  items  were  heavily  weighted  toward  spe¬ 
cific  operational  tasks,  an  additional  7  items  were  in¬ 
cluded  to  reflect  more  general  traits  which  also  were 
hypothesized  to  play  a  role  in  SA.  These  items  were 
based  on  the  study  of  fighter  pilot  combat  effectiveness 
previously  discussed  (22).  Concise  definitions  for  each 
item  were  developed  with  assistance  from  an  experi¬ 
enced  fighter  pilot.  The  resulting  list  and  definitions 
were  reviewed  and  revised  by  several  other  experienced 
pilots  to  ensure  accuracy  and  completeness.  These  31 
items  and  the  8  categories  that  they  represent  are  pre¬ 
sented  in  Table  I  and  form  the  essence  of  the  approach 
taken  to  the  measurement  of  SA  in  the  present  study. 

To  summarize,  the  purpose  of  the  present  investiga¬ 
tion  was  to  develop  a  set  of  tools  for  measuring  SA 
within  an  operational  fighter  environment.  Issues  to  be 
addressed  include:  1)  reliability  of  the  SARS  and  poten¬ 
tial  effects  of  bias  factors  such  as  flight  qualification  of 
the  rater  and  squadron  membership;  2)  inter-rela¬ 
tionships  among  the  supervisory,  peer,  and  self-report 
SARS;  3)  development  of  a  single  composite  SA  score; 
and  4)  external  validity  of  the  composite  SA  score  as 
determined  by  relationships  with  previous  flight  expe¬ 
rience  factors. 

METHODS 

Subjects 

The  subjects  were  205  mission-ready  USAF  F-15C 
pilots from 8 operational fighter squadrons. Mean, standard
deviation, and range of flight hours, respectively,
were  as  follows:  total  flight  hours  beyond  UPT  (1258, 
744,  202  to  3717)  and  total  flight  hours  in  the  F-15  (668, 
305,  74  to  1823).  Current  flight  qualifications  of  these 
pilots,  in  order  of  increasing  experience  and  profi¬ 
ciency,  included  48  wingmen,  59  2-ship  leads,  32  4-ship 
leads,  and  66  instructor  pilots. 

Materials 

Three  scales  were  developed  to  measure  the  SA  abil¬ 
ity  of  a  pilot  from  three  different  perspectives:  the  self- 
report  SARS,  the  peer  SARS,  and  the  supervisory 
SARS.  Survey  forms  were  custom-designed  and  repro¬ 
duced  through  an  offset  printing  process  to  make  use  of 
computer-based  data  scanning  technology.  Each  survey 
type  was  two  pages:  the  first  page  contained  printed 
instructions,  scale  description,  and  subject  identifica¬ 
tion  codes;  the  second  page,  the  actual  rating  scales. 

For  the  self-report  SARS,  subjects  rated  their  own 
ability  on  each  of  the  31  items  in  comparison  with  other 
F-15C  pilots  using  a  6-point  scale.  End-point  anchors 
ranged  from  a  low  of  “Acceptable,”  since  all  pilots 
were  mission-ready,  to  a  high  of  “Outstanding.”  The 
peer  SARS  required  each  subject  to  rate  all  other  mis¬ 
sion-ready  pilots  in  his  squadron.  Each  pilot  listed  on 
the  peer  SARS  was  rated  on  his  general  fighter  pilot 
ability  and  SA  ability  using  the  same  6-point  scale.  Once 
these  ratings  were  completed,  these  pilots  were  then 
rank-ordered  from  highest  to  lowest  in  terms  of  their  SA 
ability.  A  provision  was  included  on  the  form  for  not 
rating  a  pilot  if  the  rater  felt  he  had  insufficient  knowl¬ 
edge  of  that  particular  individual.  The  supervisory 
SARS  used  the  same  31  items  and  the  6-point  scale  as 
the  self-report  SARS.  Again,  the  reference  was  the  rel¬ 
ative  ability  of  the  ratee  in  comparison  with  other  F-15C 
pilots. 

The self-report and peer SARS were completed by all
subjects within the sample. The supervisory SARS were
completed by only a subset of subjects chosen to be
raters, based upon their experience and supervisory
positions. Raters within each squadron included: the
Squadron Commander, Ops (Operations) Officer, Assistant
Ops Officer, Weapons Officer, and Stan-Eval
(Standard-Evaluation) Flight Examiner (SEFE), who
rated all mission-ready pilots within the Squadron; and
the Flight Commanders, who rated only pilots within
their flight as well as other Flight Commanders.


TABLE I. ITEMS AND CATEGORIES USED IN SARS.

1. GENERAL TRAITS
   Discipline
   Decisiveness
   Tactical knowledge
   Time-sharing ability
   Spatial ability
   Reasoning ability
   Flight management

2. TACTICAL GAME PLAN
   Developing plan
   Executing plan
   Adjusting plan on-the-fly

3. SYSTEM OPERATION
   Radar
   Tactical electronic warfare system
   Overall weapons system proficiency

4. COMMUNICATION
   Quality (brevity, accuracy, timeliness)
   Ability to effectively use information

5. INFORMATION INTERPRETATION
   Interpreting vertical situation display
   Interpreting threat warning system
   Ability to use controller information
   Integrating overall information
   Radar sorting
   Analyzing engagement geometry
   Threat prioritization

6. TACTICAL EMPLOYMENT-BVR
   Targeting decisions
   Fire-point selection

7. TACTICAL EMPLOYMENT-VISUAL
   Maintain track of bogeys/friendlies
   Threat evaluation
   Weapons employment

8. TACTICAL EMPLOYMENT-GENERAL
   Assessing offensiveness/defensiveness
   Lookout
   Defensive reaction
   Mutual support

Procedures 

The  surveys  were  administered  on  location  at  each 
fighter  squadron  base.  An  elaborate  numerical  coding 
procedure  was  followed  to  ensure  the  confidentiality  of 
each  subject’s  data.  The  survey  administrators  briefed 
all  subjects  regarding  the  objectives  of  the  research, 
scale  description  and  item  definitions,  confidentiality 
procedures,  and  instructions  for  completing  the  sur¬ 
veys.  Identification  codes  and  dates  on  each  survey 
were  already  filled  in  for  each  subject  prior  to  adminis¬ 
tration.  Each  subject  removed  the  surveys  enclosed 
within  the  envelope,  completed  them,  returned  them  to 
the  envelope,  and  removed  all  name  labels.  These  labels 
were  given  to  the  test  administrator  who  destroyed 
them,  thus  leaving  no  name  identification  within  or  out¬ 
side  the  envelope. 

Data  regarding  each  subject’s  flight  career  experience 
were  obtained  directly  from  a  computerized  database 
and  through  responses  to  a  background  questionnaire 
administered  to  each  subject.  These  data  included  flight 
hours  and  sorties  by  aircraft  type,  hours  and  sorties  for 
both  combat  and  combat  support  missions,  current 
flight  qualification,  supervisory  responsibilities,  ad¬ 
vanced  fighter  training,  and  participation  in  special 
fighter  exercises  and  training  simulations. 

For  the  self-report  SARS,  nine  summary  scores  were 
produced.  These  included  an  overall  score,  which  was 
the  mean  of  all  31  items,  and  8  category  scores,  which 
were  the  means  of  all  items  within  a  particular  category. 
For  the  supervisory  SARS,  the  same  nine  summary 
scores  were  generated  for  each  ratee  as  follows:  first, 
the  same  nine  summary  scores  were  computed  for  each 
rater’s  assessment  of  each  ratee;  then,  means  for  each 
summary  score  were  computed  across  all  raters  and 
used  as  the  final  nine  supervisory  SA  scores  for  each 
ratee.  For  the  peer  SARS,  three  summary  scores  were 
produced  for  each  ratee  as  follows:  first,  three  scores 
were  generated  by  each  rater  for  each  ratee,  the  ratings 
of  fighter  pilot  ability  and  SA  ability,  and  the  rank  order; 
means  of  these  three  scores  were  then  computed  across 
all  raters  and  used  as  the  final  peer  SA  scores  for  each 
ratee. 
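
The scoring steps described above can be sketched in Python as follows. This is only an illustration of the arithmetic: the category sizes are taken from Table I, while the function names, the dictionary keys, and the assumption that each completed form is a flat list of 31 ratings in Table I order are hypothetical and not part of the original study materials.

```python
from statistics import mean

# Items per category, taken from Table I (7 + 3 + 3 + 2 + 7 + 2 + 3 + 4 = 31).
# The key names are hypothetical labels chosen for this sketch.
CATEGORY_SIZES = {
    "general_traits": 7,
    "tactical_game_plan": 3,
    "system_operation": 3,
    "communication": 2,
    "information_interpretation": 7,
    "tactical_employment_bvr": 2,
    "tactical_employment_visual": 3,
    "tactical_employment_general": 4,
}


def summary_scores(item_ratings):
    """Return the nine summary scores for one completed 31-item form."""
    if len(item_ratings) != sum(CATEGORY_SIZES.values()):
        raise ValueError("expected 31 item ratings")
    # Overall score: mean of all 31 items.
    scores = {"overall": mean(item_ratings)}
    # Category scores: mean of the items belonging to each category.
    start = 0
    for name, size in CATEGORY_SIZES.items():
        scores[name] = mean(item_ratings[start:start + size])
        start += size
    return scores


def supervisory_scores(forms):
    """Average each summary score across all raters of a single ratee."""
    per_rater = [summary_scores(form) for form in forms]
    return {key: mean(s[key] for s in per_rater) for key in per_rater[0]}
```

For the self-report SARS, `summary_scores` alone would apply; for the supervisory SARS, `supervisory_scores` takes one 31-item list per rater and returns the ratee's final nine scores.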

RESULTS 

SARS reliability: The first set of analyses addressed
the reliability of the three SARS instruments and
checked for systematic biases due to the current flight
qualification of the rater or squadron membership. Two
types of reliability were estimated: internal consistency
and inter-rater agreement. First, internal consistency
was estimated for the supervisory and self-report SARS
by calculation of Cronbach’s coefficient α. For the
supervisory SARS, coefficient α was computed to be 0.99
for all 31 items. These results were based on the total
number of supervisory SARS completed (N = 884). For
the self-report SARS, α was computed to be 0.97 for all
31 items. Again, these were based on the total number
of self-report SARS completed (N = 187). Second,
inter-rater agreement was estimated for the supervisory
and peer SARS. Scores used in the calculation of these
estimates included the nine scores from the supervisory
SARS (eight category scores plus overall score) and the
three scores from the peer SARS. Reliability estimates
for each of the 12 scores were computed for each
squadron, using an analysis of variance (ANOVA) procedure
(6). Two estimates of reliability were produced: first, the
estimated reliability for a single rater, r11, and second,
the reliability of all raters, rkk. For the supervisory
SARS, the average r11 across the eight squadrons was
computed to be 0.50. The average rkk increased to 0.88.
For the peer SARS, average r11 was computed to be 0.60
and average rkk increased to 0.97. These data clearly
demonstrate the increase in the reliability of the scores
through the addition of multiple raters.
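
As a rough illustration of the two kinds of reliability estimate described above, the following Python sketch computes Cronbach’s α from item-level ratings and projects a single-rater reliability onto the mean of k raters with the Spearman-Brown formula. The function names are hypothetical, and the Spearman-Brown step is an assumption about how r11 and rkk relate; the study itself obtained both from an ANOVA procedure (6).

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per completed form."""
    k = len(rows[0])
    # Variance of each item across forms, and variance of the total score.
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def spearman_brown(r_single, k):
    """Reliability of the mean of k raters, given the single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)
```

Under these assumptions, `spearman_brown(0.50, 7)` gives 0.875 and `spearman_brown(0.60, 20)` gives about 0.97, which is in line with the reported averages of 0.88 and 0.97; the rater counts of 7 and 20 are illustrative values only.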

A 2-factor ANOVA was used to determine any bias
effects in the supervisory and peer SARS due to rater
qualification and squadron membership. Eleven
ANOVAs were computed for the nine supervisory
SARS scores and two of the peer SARS scores, ratings
of fighter pilot and SA ability. For each effect, ω² was
also computed as an estimate of the strength of the
association. For rater qualification, no significant effects
were found. For squadron membership, only the SA
ability rating produced a significant effect with F(7, 151)
= 2.08, p < 0.05. ω² was computed to be 0.039,
indicating a very small effect size. Moreover, of the 28
pairwise comparison tests, only 2 reached significance.
Additionally, no significant interaction effects were found
for any of the 11 scores.
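
For readers unfamiliar with ω² as a measure of strength of association, the sketch below shows one common way to compute it from ANOVA sums of squares. It is a simplified illustration, not the authors’ procedure: it uses a one-way design on made-up data rather than the 2-factor design reported here, and the function names are hypothetical.

```python
from statistics import mean


def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega-squared estimate of the proportion of variance due to an effect."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def one_way_anova(groups):
    """Return (F, omega_squared) for a one-way between-groups design."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_within = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_within
    w2 = omega_squared(ss_between, df_between, ms_within, ss_between + ss_within)
    return f_ratio, w2
```

With three toy groups `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, the function returns F = 3.0 and ω² = 4/13, or about 0.31.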

Similar analyses were conducted for the nine summary
scores from the self-report SARS. In this case,
however, it was expected that significant effects would
occur, due to rater qualification. In other words, it was
hypothesized that the self-ratings of instructor pilots,
for example, would be higher than the self-ratings of
inexperienced wingmen. The results of the ANOVAs
confirmed these expectations. All F-ratios were
statistically significant (p < 0.05). For the overall score, ω²
was computed to be 0.16, indicating a moderate amount
of the variance explained by this factor. Means for all
scores were in accordance with expectations and
ranked according to qualification, with IPs having the
highest scores, followed in order by 4-ship leads,
2-ship leads, and wingmen. Significant squadron effects
were also obtained for six of the scores, including the
overall score. Pairwise comparison tests revealed that
one squadron accounted for these differences with
higher means for all scores. When ANOVAs were
recomputed excluding data from that squadron, only one
of the nine scores produced a significant squadron
effect, accounting for less than 4% of the variance. Only
one of the pairwise comparison tests reached
significance.

SARS intercorrelations: The second set of analyses
computed intercorrelations among the three sets of
SARS scores, which are presented in Table II. For the
sake of brevity, only correlations with the overall score
are presented for both the self-report and supervisory
SARS. The average correlation of category scores with
the overall score was computed to be 0.95 for the
supervisory SARS and 0.86 for the self-report SARS,
indicating a high degree of internal consistency. All
correlations were statistically significant (p < 0.01) and, as
seen in Table II, the relationships among the supervisory
and peer ratings were quite high. Although the
correlations of the self-report SARS with the other
ratings were positive, their magnitude was substantially
lower.


TABLE II. CORRELATIONS AMONG SUPERVISORY, PEER,
AND SELF-REPORT SCORES.

                                         1      2      3      4      5
1. Supervisory SARS-Overall Score        -
2. Peer SARS-Fighter Pilot Ability     0.89     -
3. Peer SARS-SA Ability                0.91   0.98     -
4. Peer SARS-Rank Order                0.92   0.91   0.92     -
5. Self-Report SARS-Overall Score      0.45   0.56   0.57   0.49     -

SARS  composite  score  development:  In  developing  a 
composite  SARS  score,  it  was  decided  to  exclude  the 
self-report  SARS  for  two  reasons.  First,  the  self-report 
SARS  was  significantly  influenced  by  squadron  mem¬ 
bership.  And  second,  only  moderate  correlations  were 
found  with  the  supervisory  and  peer  ratings.  Conse¬ 
quently,  only  the  three  peer  SARS  scores  and  the  eight 
supervisory  SARS  category  scores  were  included  in  the 
development  of  a  single  composite  score.  The  overall 
score  from  the  supervisory  SARS  was  excluded  since, 
mathematically,  it  represented  a  linear  combination  of 
the  category  scores.  A  principal  components  analysis 
was  performed  to  determine  the  underlying  structure  of 
these  scores.  The  first  principal  component  was  found 
to  account  for  92.5%  of  the  total  variance  of  these 
scores,  the  second  component  3.3%,  and  the  remaining 
components  less  than  1%  of  the  variance.  Based  upon 
these  results,  it  was  decided  to  compute  composite 
scores  based  upon  the  first  unrotated  principal  compo¬ 
nent  due  to  the  large  amount  of  variance  it  explained. 
These  scores  were  transformed  to  a  distribution  with 
mean  of  100  and  a  standard  deviation  of  20  for  use  as  the 
composite  SA  score  in  subsequent  analyses. 
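
The composite-score construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it extracts the first component of the correlation matrix by power iteration (the text does not say whether the correlation or covariance matrix was analyzed, nor which software was used), and all function names are hypothetical. Each input column would correspond to one of the 11 scores (three peer scores and eight supervisory category scores).

```python
from statistics import mean, pstdev


def standardize(column):
    """Convert a list of scores to z-scores (mean 0, SD 1)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def first_component_scores(columns, iterations=200):
    """Scores on the first unrotated principal component.

    columns holds one list of values per variable. Returns the component
    scores and the proportion of total variance the component explains.
    """
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    # Correlation matrix of the standardized variables.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    scores = [sum(v[j] * z[j][k] for j in range(p)) for k in range(n)]
    return scores, eigenvalue / p


def rescale(scores, new_mean=100.0, new_sd=20.0):
    """Transform scores to a distribution with the given mean and SD."""
    return [new_mean + new_sd * x for x in standardize(scores)]
```

Calling `rescale(first_component_scores(columns)[0])` yields composite scores with a mean of 100 and a standard deviation of 20, matching the scaling described above.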

Effects of previous experience on composite SARS
score: Analyses were performed to determine if the
composite SA score was related to previous flight
experience information. It seemed reasonable to expect
that measures of experience such as flight hours, flight
qualification, and combat training exercise participation
should be related, to some extent, to our composite
score. In fact, if such relationships were not found, it
would seriously question the validity of our composite
SA score. Experience factors that were analyzed
included: total flight time; total flight time in the F-15;
exercise participation (i.e., number attended) including
Red Flag (0, 1, 2, ≥3), Green Flag (0, ≥1), Maple Flag
(0, ≥1), and William Tell (0, ≥1); air combat simulation
training experience (yes/no) including the McDonnell-
Douglas Advanced Air Combat Simulation (MACAIR)
and the Simulator for Air-to-Air Combat (SAAC);
Desert Storm experience (yes/no); and current flight
qualification, including whether the pilot was a Fighter
Weapons School graduate (yes/no). Additionally, the
effect of squadron membership was also analyzed. A
one-way ANOVA was computed for each factor, except
for flight time. For total flight hours and flight hours in
the F-15, correlations were computed. The results are
summarized in Table III.

As  shown  in  Table  III,  most  of  the  experience  factors 
were  related  to  the  composite  measure  of  SA.  In  fact, 
only  two  of  the  measures  were  not  significantly  related 
to  SA,  participation  during  Desert  Storm  and  previous 
training  in  the  Simulator  for  Air-to-Air  Combat.  It 
should  also  be  noted  that  squadron  membership  had  no 
effect  on  the  composite  SA  score.  In  all  cases,  the  di¬ 
rection  of  the  means  was  such  that  higher  experience 
was  associated  with  better  SA  scores.  In  fact,  some  of 
the  relationships  were  extremely  high.  For  example, 
current  flight  qualification  accounted  for  68%  of  the 
variance  of  the  SA  measure.  These  means  are  presented 
in  Fig.  1.  As  shown,  there  is  a  very  strong  relationship 
with  flight  qualification. 
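
For the two flight-hour factors, which were analyzed with correlations rather than ANOVA, the variance figures in Table III are consistent with the squared correlation. The short Python sketch below shows that arithmetic alongside a plain Pearson correlation; the function names are hypothetical, and treating the third column of Table III as r² for these two rows is an inference from the numbers rather than something the text states.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shared_variance(r):
    """Proportion of variance shared by two variables correlated at r."""
    return r * r
```

Rounded to two places, `shared_variance(0.59)` is 0.35 and `shared_variance(0.39)` is 0.15, the values listed for F-15 hours and total flight hours.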

In  the  final  set  of  analyses,  a  prediction  equation  was 
derived  for  the  composite  SA  score  using  a  combination 
of  background  and  experience  factors.  A  stepwise  re¬ 
gression  analysis  was  performed  with  the  composite  SA 
score  as  the  dependent  variable  and  those  statistically 
significant  background  experience  factors  listed  in  Ta¬ 
ble  III  as  the  potential  set  of  predictor  variables.  A 
“dummy  variable”  coding  scheme  was  employed  to  en¬ 
able  entry  of  flight  qualification  which  is  a  categorical 
variable.  A  4-variable,  “best  fit”  prediction  equation 
was  produced  with  a  multiple  R  of  0.85.  Variables  in¬ 
cluded  in  the  equation,  in  their  order,  were  flight  qual¬ 
ification,  graduation  from  Fighter  Weapons  School,  par¬ 
ticipation  at  Green  Flag,  and  participation  at  Maple 
Flag.  The  overall  multiple  R  was  statistically  significant 
(p  <  0.0001)  as  well  as  the  contribution  of  each  variable 
within  the  equation  (p  <  0.05). 
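
The dummy-variable coding mentioned above can be illustrated with a short Python sketch. The four qualification levels come from the Subjects section; the choice of wingman as the reference category, the level labels, and the function names are assumptions made for illustration, since the text does not specify how the coding was set up.

```python
# Flight qualification levels in order of increasing experience.
QUALIFICATIONS = ["wingman", "2-ship lead", "4-ship lead", "instructor pilot"]


def dummy_code(qualification, reference="wingman"):
    """Code a 4-level qualification as three 0/1 indicator variables.

    The reference level (an assumption here) is coded as all zeros.
    """
    if qualification not in QUALIFICATIONS:
        raise ValueError("unknown qualification: " + qualification)
    levels = [q for q in QUALIFICATIONS if q != reference]
    return [1 if qualification == level else 0 for level in levels]


def variance_explained(multiple_r):
    """Proportion of criterion variance accounted for by the equation."""
    return multiple_r ** 2
```

Here `dummy_code("4-ship lead")` returns `[0, 1, 0]`, and `variance_explained(0.85)` is about 0.72. The abstract’s figure of 73% presumably reflects the unrounded multiple R.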

DISCUSSION 

Three measurement tools were developed for assessing
SA within an operational fighter environment. The
primary concerns with any measurement device are its
reliability, its susceptibility to unwanted bias factors,
and its validity.

TABLE III. EFFECTS OF BACKGROUND AND EXPERIENCE
FACTORS ON COMPOSITE SA SCORE.

Factor                    F-Ratio    p         ω²
Squadron                    0.59     NS        0.00
Flight Qualification      128.57     <0.001    0.68
Exercise Participation
  Red Flag                 13.55     <0.001    0.18
  Green Flag                6.15     <0.01     0.06
  Maple Flag                5.28     <0.05     0.02
  William Tell             17.05     <0.001    0.09
Fighter Weapons Grad       55.85     <0.001    0.24
Simulator Experience
  MACAIR                   29.81     <0.001    0.14
  SAAC                      1.01     NS        0.00
Desert Storm Veteran        1.54     NS        0.00
F-15 Hours                  0.59*    <0.001    0.35
Total Flight Hours          0.39*    <0.001    0.15

* Correlations.

Fig. 1. Composite SA Score as a Function of Flight
Qualification.

SARS reliability: Reliability estimates, in all cases,
were quite high. Estimates of internal consistency for
both the self-report and supervisory SARS exceeded
0.95, indicating that whatever the 31 items might be
measuring, they are indeed measuring it consistently.
Of greater importance, however, are the estimates of
inter-rater reliability. It was reasoned that both the
reliability and validity of the criterion SA scores would be
enhanced if each ratee were evaluated by multiple raters.
Consequently, for the supervisory SARS, each ratee's score
was based on the average of five to eight raters. For the
peer SARS, the number of raters ranged from 18 to 23. The
results of the reliability analyses confirm the value of
such an approach. The average reliabilities across squadrons
obtained for a single rater for both the supervisory and
peer SARS were marginal. However, a large increase occurred
when the average scores for all raters were used as the
estimate. Although such increases in reliability from use of
multiple raters seem intuitive, the performance rating
literature (11) has not always produced such effects.
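The intuition that averaging over raters raises reliability
is usually expressed with the Spearman-Brown formula. The
sketch below applies that formula to illustrative values
only; it assumes that raters are interchangeable, and it is
not stated here that the reliabilities in this study were
computed this way. The function name and the input values
are hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean of `n_raters` ratings.

    Assumes every rater has the same single-rater reliability and
    that rater errors are uncorrelated.
    """
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1) * r)


# Illustrative single-rater reliability of 0.50 (not a study value),
# projected for rater panels of different sizes.
projected = {k: spearman_brown(0.50, k) for k in (1, 5, 8, 18, 23)}
```

Under these assumptions a marginal single-rater value rises
steeply over the first few raters and then levels off, so a
panel of five to eight raters already captures most of the
gain available from a panel of about twenty.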

Sources of bias: In addition to reliability, there was
also concern that the SARS might be systematically biased.
The two major potential sources of such bias were
squadron membership and the qualification of the rater.
The results indicated that no such significant effect
occurred for either the supervisory or peer SARS. Of the
33 effects tested, only 1, the peer rating for SA ability,
produced a statistically reliable difference. However,
not much significance is attached to that difference for
two reasons. First, the size of the effect was quite small,
accounting for less than 4% of the variance. Second, the
peer rating for fighter ability produced no significant
difference despite the fact that it was highly correlated
(r = 0.98) with the peer rating of SA ability. For the
self-report SARS, however, significant effects due to
rater qualification were expected, and these were confirmed
by the results. Of interest was the finding that
one squadron produced significantly higher ratings.
When the data from this squadron were removed, no
meaningful squadron effect was obtained. Reasons for
the elevated self-report SARS scores for the one squadron
were not apparent.

Interrelationships among SARS: An analysis of
interrelationships among the SARS scores produced extremely
high correlations between the supervisory and peer SARS
scores. Such magnitude would not have been expected, based
on the previous literature (11). Of greater consistency with
the literature were the relationships with the self-report
SARS scores. Although positive correlations were obtained
between the self-report SARS and the supervisory and peer
SARS, their magnitudes were significantly lower. Moreover,
a comparison of the overall means between the supervisory
and self-report SARS revealed higher means for the
self-report SARS scores, which is consistent with the
previous findings of a “leniency” effect of self-ratings (11).

The high degree of consistency between the supervisory
and peer SARS scores was further confirmed by
the principal components analysis in which the first
component accounted for over 92.5% of the total variance.
The average correlation between the eight category
SARS scores and the first component score was
0.96. The second component accounted for an additional
3.3% of the variance and seemed to represent
some unique variance associated with the peer SARS.
For the peer SARS, correlations with this second component
score were 0.34 and 0.33 for fighter pilot and SA ability,
respectively, and 0.19 for the ranking. All correlations
with the supervisory SARS scores were negative and most
(six of eight) were not statistically significant. Overall,
these results further substantiate the high agreement
between the supervisory and peer SARS scores and the
existence of a very large component that can account for
most of the variance. Although there appears to be a second
component that is capturing some unique variance associated
with the peer SARS, its size was quite small, and it was
consequently not used as a criterion measure of SA.
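How a single component can carry most of the variance of a
set of highly intercorrelated scores can be sketched with a
toy correlation matrix. The example below assumes the
analysis is run on a correlation matrix and substitutes
power iteration for a full eigendecomposition; the function
name, the matrix, and the value of 0.90 are hypothetical and
are not taken from this study.

```python
def first_component(corr, iterations=200):
    """Estimate the first principal component by power iteration.

    Returns (share of total variance, loadings). For a correlation
    matrix the total variance equals the number of variables.
    """
    p = len(corr)
    v = [1.0 / p ** 0.5] * p      # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        eigenvalue = norm         # converges to the largest eigenvalue
    # A loading is the correlation of a variable with the component.
    loadings = [x * eigenvalue ** 0.5 for x in v]
    return eigenvalue / p, loadings


# Toy matrix: eight scores that all intercorrelate 0.90.
p = 8
corr = [[1.0 if i == j else 0.90 for j in range(p)] for i in range(p)]
share, loadings = first_component(corr)
```

For this toy matrix the first eigenvalue is 1 + 7(0.90) = 7.3,
so the first component carries 7.3/8, or about 91%, of the
total variance. The uniform starting vector is adequate when
the scores are positively intercorrelated, as they are here.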

Effects of flight experience: At the outset, we hypothesized
that there would be positive relationships between
flight experience and the SA criterion measure.
In fact, any measure that was unrelated or negatively
related to flight experience would be highly suspect.
The results clearly supported our hypotheses in that
most of the experience data produced positive relationships
with the composite SA score. The finding of positive
correlations for both total flight time and time in the
F-15 is consistent with the earlier literature, although
the obtained correlations were higher than had been
reported previously. Current flight qualification was the
variable found most highly correlated with the criterion
SA measure. In fact, this variable alone accounted for
nearly 68% of the criterion variance. When flight
qualification was combined with other available information,
a prediction equation could be developed which
accounted for nearly 73% of the criterion variance,
which is equivalent to a correlation of 0.85. These results
clearly indicate that the criterion measure of SA
developed for this study can be predicted reasonably
well from readily available background and flight
experience information.

Interpretation and use: Two questions emerge from
these findings. First, what is actually being measured by
the SARS? And second, what are the implications for use of
the composite SARS score as a criterion measure? An
inherent problem with most criterion measures is that
they usually represent a “picture” in time (1). Within
the operational fighter environment, pilots progress in a
fairly “lock step” manner as they move from one flight
qualification to another. F-15 pilots begin their careers
in an operational fighter squadron by completing mission
qualification training. At that point they are designated
mission-ready wingmen. After a certain number
of hours in the jet, they become eligible for upgrade to
2-ship lead. If successful, they gain experience (i.e.,
flight hours) and eventually become eligible for upgrade
to 4-ship lead. Within this process, a certain amount of
“selection” occurs. If they are judged not to have the
requisite skills for upgrade, their careers as fighter pilots
will usually end and they will be reassigned. Viewed in
this manner, it is not surprising that current flight
qualification is highly related to our criterion measure of SA.
It is clear that all raters (both supervisors and peers)
were aware of each ratee’s flight qualification within the
squadron. Such knowledge likely provided a good frame
of reference and to some unknown extent may have
been the basis for making judgments required in the
SARS. Consequently, it appears that the SARS, in large
part, measures what might be termed an Air Force
“operational management” view of fighter pilot skill, and as
such would be highly correlated with flight experience
and current qualification. However, the criterion SA
measure is more than “experience only,” as indicated
by the number of what might be termed exceptions. For
example, instances occurred in which an individual’s
criterion SA score was “inconsistent” with his or her
qualification level, such as IPs who received scores
more characteristic of wingmen. Conversely, there
were some wingmen and 2-ship leads who received
scores much higher than their experience would suggest.

Implications for use of the composite SA score as a
criterion measure are fairly straightforward. It is clear
that effects due to background and flight experience
must be controlled when these scores are used as criterion
measures. This could be accomplished by partialling
out these effects and using scores representing the
residual variance as measures. Alternatively, separate
analyses could be conducted for each qualification category.
Regardless, the fact remains that experience accounted
for a very large percentage of the variance
within our criterion measure. It could be argued that,
within the highly homogeneous and select population of
the F-15 fighter community, training and experience are
likely to account for more variability than ability
differences. It is expected that results of the simulation
phase of this study will shed additional light on this issue.
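The partialling approach can be sketched for the simplest
case, in which flight qualification is the only factor
removed. Subtracting each pilot's qualification-category
mean is equivalent to taking residuals from a regression on
dummy-coded qualification. The function name and the toy
scores below are hypothetical; controlling for additional
factors such as flight hours would require a full
regression.

```python
from collections import defaultdict


def residualize_by_category(scores, categories):
    """Remove category means from scores.

    Returns one residual per input score: the score minus the mean
    of all scores sharing its category.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, cat in zip(scores, categories):
        totals[cat] += score
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in totals}
    return [score - means[cat] for score, cat in zip(scores, categories)]


# Toy composite scores (not study data) for two categories.
scores = [1.0, 3.0, 4.0, 6.0]
quals = ["wingman", "wingman", "4-ship lead", "4-ship lead"]
residuals = residualize_by_category(scores, quals)
```

The residuals average zero within each category, so a pilot
who scores above others at the same qualification level
receives a positive value regardless of that level.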